For this assignment you are welcome to use other regex resources, such as regex "cheat sheets" you find on the web.
Before you start working on the problems, here is a small example to help you understand how to write your own answers. In short, write your solution inside the given function body and return the final result. The autograder will then call the function and validate the returned result.
def example_word_count():
    # This example question requires counting words in the example_string below.
    example_string = "Amy is 5 years old"

    # YOUR CODE HERE.
    # You should write your solution here and return your result; you can comment out or delete the
    # NotImplementedError below.
    result = example_string.split(" ")
    return len(result)
    #raise NotImplementedError()
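Because the autograder validates the returned value, a function that only prints its answer would not pass. The sketch below illustrates the difference. The `check` helper is a hypothetical stand-in for a single autograder check, not the real autograder, and the two function names are made up for this illustration.

```python
def returns_count():
    # Returns the word count, so a caller can inspect it.
    return len("Amy is 5 years old".split(" "))


def prints_count():
    # Only prints the word count; the function itself returns None.
    print(len("Amy is 5 years old".split(" ")))


def check(func, expected):
    """Hypothetical stand-in for one autograder check: call and compare."""
    return func() == expected


returned_ok = check(returns_count, 5)  # True: the value 5 is returned
printed_ok = check(prints_count, 5)    # False: None is returned
```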
Find a list of all of the names in the following string using regex.
import re
def names():
    simple_string = """Amy is 5 years old, and her sister Mary is 2 years old.
Ruth and Peter, their parents, have 3 kids."""

    # YOUR CODE HERE
    # A name is one capital letter followed by any number of lowercase letters.
    pattern = r'[A-Z][a-z]*'
    result = re.findall(pattern, simple_string)
    return result
assert len(names()) == 4, "There are four names in the simple_string"
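The pattern above works because the names are the only capitalized words in `simple_string`. As a rough illustration of that limit, the sketch below runs the same pattern and a slightly stricter variant on a made-up sentence (the `sample` string is an assumption for this example, not part of the assignment data).

```python
import re

# Made-up sentence: it contains capitalized words that are not names.
sample = "I think Amy is 5. The twins Ruth and Peter are 3."

# Same pattern as in names(): a capital followed by zero or more lowercase letters.
loose = re.findall(r"[A-Z][a-z]*", sample)
# → ['I', 'Amy', 'The', 'Ruth', 'Peter']

# Requiring at least one lowercase letter skips the lone capital "I",
# but a sentence-initial word such as "The" still matches.
stricter = re.findall(r"\b[A-Z][a-z]+\b", sample)
# → ['Amy', 'The', 'Ruth', 'Peter']
```

Neither variant can tell a name from any other capitalized word, so capitalization is a heuristic that happens to fit this exercise and is not a general way to detect names.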
The dataset file in assets/grades.txt contains a newline-separated list of people with their grade in a class. Create a regex to generate a list of just those students who received a B in the course.
import re
def grades():
    with open("assets/grades.txt", "r") as file:
        grades = file.read()

    # YOUR CODE HERE
    # The capture group keeps only the name, so findall() returns the names directly
    # and no second pass over the matches is needed.
    pattern = r'([A-Z][a-z]* [A-Z][a-z]*): B'
    return re.findall(pattern, grades)
assert len(grades()) == 16
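To see how the capture group behaves, here is a small sketch on invented data. The names and grades in `sample` are hypothetical, and the `Name: Grade` layout is assumed from the pattern used above and has not been checked against assets/grades.txt. It also shows a line-anchored alternative using `re.MULTILINE`, which is a swapped-in technique and not the approach the solution above uses.

```python
import re

# Invented sample in an assumed "Name: Grade" layout, one person per line.
sample = "\n".join([
    "Ada Smith: A",
    "Ben Jones: B",
    "Mary-Ann O'Neil: B",
    "Dan Brown: C",
])

# With one capture group, findall() returns only the captured text.
two_word_names = re.findall(r"([A-Z][a-z]* [A-Z][a-z]*): B", sample)
# → ['Ben Jones']

# Anchoring to whole lines accepts any name shape before ": B".
any_names = re.findall(r"^(.+): B$", sample, flags=re.MULTILINE)
# → ['Ben Jones', "Mary-Ann O'Neil"]
```

The two-word pattern is stricter about what counts as a name, while the anchored pattern is stricter about where the grade sits on the line. Which trade-off fits depends on the actual file contents.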
Consider the standard web log file in assets/logdata.txt. This file records each access a user makes when visiting a web page (like this one!). Each line of the log has the following items: a host, a user name, the time of the request, and the request itself.
Your task is to convert this into a list of dictionaries, where each dictionary looks like the following:
example_dict = {"host":"146.204.224.152",
                "user_name":"feest6811",
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"}
import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()

    # YOUR CODE HERE
    # Each named group becomes a dictionary key. The raw string keeps \d, \s, and \S
    # from being treated as (invalid) string escape sequences.
    pattern = r'(?P<host>(?:\d+\.){3}\d+)\s+(?:\S+)\s+(?P<user_name>\S+)\s+\[(?P<time>[-+\w\s:/]+)\]\s+"(?P<request>.+?)"'
    save = []
    for item in re.finditer(pattern, logdata):
        save.append(item.groupdict())
    return save
assert len(logs()) == 979
one_item = {'host': '146.204.224.152',
            'user_name': 'feest6811',
            'time': '21/Jun/2019:15:45:24 -0700',
            'request': 'POST /incentivize HTTP/1.1'}
assert one_item in logs(), "Sorry, this item should be in the log results, check your formatting"
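A long single-line pattern can be hard to read, so the sketch below spells out a similar pattern with `re.VERBOSE`, one commented piece per field. The two lines in `sample_log` are assumptions: the first is built from the example dictionary, with the trailing status code and size invented, and the second is fully made up to show a line whose user name is `-`. The name `LOG_PATTERN` is also just for this illustration.

```python
import re

# Assumed sample lines; only the four named fields come from the example dictionary.
sample_log = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '10.0.0.1 - - [21/Jun/2019:15:45:25 -0700] '
    '"GET /index.html HTTP/1.1" 200 1234\n'
)

# re.VERBOSE ignores whitespace in the pattern and allows inline comments.
LOG_PATTERN = re.compile(r'''
    (?P<host>(?:\d+\.){3}\d+)    # four dot-separated numbers
    \s+\S+\s+                    # one field that is skipped
    (?P<user_name>\S+)           # user name, or "-" when it is missing
    \s+\[(?P<time>[^\]]+)\]      # everything between the square brackets
    \s+"(?P<request>[^"]+)"      # everything between the double quotes
''', re.VERBOSE)

# groupdict() turns each match into a dictionary keyed by group name.
entries = [match.groupdict() for match in LOG_PATTERN.finditer(sample_log)]
```

Here `entries[0]` equals the example dictionary shown earlier, and `entries[1]["user_name"]` is `"-"`. Using negated character classes such as `[^"]+` is one design choice among several; the lazy `.+?` used in the solution above gives the same result on these lines.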